Molecular & Cellular Proteomics — Latest Matching Preprints

1

onsite: An Integrated Framework for Phosphosite Localization and False Localization Rate Estimation

Yue, Q.-X.; Wei, Z.; Dai, C.; Bai, M.; Perez-Riverol, Y.; Sachsenberg, T.

2026-07-11 bioinformatics 10.64898/2026.07.08.737157 medRxiv

Top 0.2%

11.8%

With the rapid development of mass spectrometry-based proteomics, the volume of phosphoproteomic data has increased substantially. However, accurate localization of phosphorylation sites and standardized statistical validation remain critical analytical bottlenecks. To address the lack of standardized cross-algorithm evaluation, we introduce onsite, a unified and open-source Python framework. onsite integrates an alanine-decoy strategy to estimate the false localization rate (FLR) across three algorithms: AScore, PhosphoRS, and pyLucXor. This modular architecture efficiently processes large-scale datasets and enables global FLR calculation. Benchmarking on the standard synthetic phosphopeptide dataset PXD000138 highlighted distinct inter-algorithmic variations. Using the same 5% global FLR threshold, pyLucXor localized the most target sites (28,353). It also reached a high accuracy (91.22%) against the known ground truth, resulting in the largest number of correctly localized sites (25,865). Reanalysis of the highly fractionated, large-scale PXD012255 dataset further demonstrated that native integration of onsite into the quantms pipeline enables scalable processing and provides a standardized framework for FLR control in large-scale phosphoproteomics.
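The decoy-based global FLR estimate mentioned in the abstract above can be illustrated with a minimal sketch. This is not the onsite implementation: the `SiteCall` record, the function names, and the `target_to_decoy_ratio` scaling factor are hypothetical, and the estimator used here (scaled cumulative decoy count divided by cumulative call count, then made monotone) is an assumed simple form of a decoy-residue approach.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SiteCall:
    score: float    # localization score; higher means more confident
    is_decoy: bool  # True when the site was placed on a decoy residue (e.g. alanine)


def estimate_flr(
    calls: List[SiteCall], target_to_decoy_ratio: float = 1.0
) -> List[Tuple[SiteCall, float]]:
    """Return calls sorted by descending score, each paired with an estimated FLR.

    Assumption: FLR at rank n = ratio * (decoys in top n) / n, capped at 1.
    The ratio is meant to compensate for unequal numbers of candidate target
    and decoy residues; 1.0 means no correction.
    """
    ranked = sorted(calls, key=lambda c: c.score, reverse=True)
    raw: List[float] = []
    decoys = 0
    for n, call in enumerate(ranked, start=1):
        decoys += int(call.is_decoy)
        raw.append(min(1.0, target_to_decoy_ratio * decoys / n))

    # Enforce monotonicity: a call is never assigned a higher FLR than any
    # lower-scoring call (running minimum taken from the bottom of the list).
    running = 1.0
    monotone: List[float] = []
    for value in reversed(raw):
        running = min(running, value)
        monotone.append(running)
    monotone.reverse()
    return list(zip(ranked, monotone))


def targets_at_flr(
    calls: List[SiteCall], threshold: float, target_to_decoy_ratio: float = 1.0
) -> int:
    """Count non-decoy site calls whose estimated FLR is at or below the threshold."""
    return sum(
        1
        for call, flr in estimate_flr(calls, target_to_decoy_ratio)
        if flr <= threshold and not call.is_decoy
    )


if __name__ == "__main__":
    demo = [
        SiteCall(9.0, False), SiteCall(8.0, False), SiteCall(7.0, False),
        SiteCall(6.0, True), SiteCall(5.0, False), SiteCall(4.0, True),
    ]
    print(targets_at_flr(demo, threshold=0.05))
```

Applying one threshold to every algorithm's score list in this way is what makes counts such as "target sites at 5% global FLR" comparable across localization tools.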

2

Complementary Single-Cell Microflow HILIC and Ion Pair LC-MS Reveal Bystander Metabolic Effects in a Macrophage Model of Tuberculosis

Cook, A.; Deshpande, R.; Ellis, A. E.; Sheldon, R.; Davison, C.; Pascoe, J.; Bird, S.; Beste, D. J.; Bailey, M.

2026-06-23 microbiology 10.64898/2026.06.22.733771 medRxiv

Top 0.3%

8.2%

Single-cell metabolomics remains analytically challenging due to the low abundance and chemical diversity of metabolites in individual cells. We have developed complementary microflow HILIC and ion pair LC-MS methods to expand metabolite coverage in single macrophages. Ion pair LC-MS was applied to single cells for the first time, enabling retention of highly polar and ionic metabolites that elute early under conventional reversed-phase conditions. Across Mycobacterium bovis BCG infected, uninfected bystander, and control unexposed THP-1 macrophages, both microflow methods detected significantly more features than a previously reported analytical-flow HILIC method. The two microflow methods provided complementary chemical space, together yielding 633 unique named metabolites with MS2 spectra. This depth enabled pathway-level interpretation at single-cell resolution, revealing infection-associated changes in purine-, arginine-, glutathione-, and one-carbon folate-associated metabolism. Metabolite-level interrogation indicated shared purine and amino acid changes in both infected and neighbouring macrophages, while revealing a distinct bystander phenotype characterised by elevated glycine and heterogeneous ATP levels. Finally, we demonstrate sequential IP and HILIC analysis of the same single cell, establishing a route toward maximal coverage from individual cells. These results position microflow HILIC and IP LC-MS as powerful, orthogonal strategies for advancing single-cell metabolomics and unveiling heterogeneity within complex biological microenvironments.

3

Community Resource: A Genome-Based Extension of Large-Scale Wheat Proteogenomics

Vincent, D.; Appels, R.

2026-07-08 plant biology 10.64898/2026.06.17.733048 medRxiv

Top 0.3%

8.1%

Bread wheat (Triticum aestivum L.) possesses a large and highly repetitive allohexaploid genome and annotation requires extensive protein-level validation. We developed a genome-based wheat proteogenomics workflow integrating large-scale MS/MS reanalysis, GFF3-based peptide coordinate reconstruction, thorough validation, and genome browser-compatible peptide deployment against the IWGSC RefSeq v2.1 reference genome. Public wheat proteomics datasets comprising 577 raw mass spectrometry files (~1.0 TB) from 32 tissues were reprocessed using FragPipe/MSFragger, generating 2,226,779 non-redundant peptides and 1,648,740 unique protein accessions. Peptide-to-genome projections using GFF3 annotation files produced 8,291,056 genomic peptide projected rows, of which 98.14% passed validation procedures. Overall, peptide evidence supported 103,095 high-confidence (HC) and 135,495 low-confidence (LC) wheat gene models, corresponding to 96.4% and 84.7% of all parsed HC and LC annotations, respectively. In total, 238,590 wheat gene models (89.4% of all parsed annotations) received protein-level support. Apollo/JBrowse-compatible BED tracks enabled exon-resolved visualisation of peptide evidence across wheat chromosomes. Together, this study establishes a scalable GFF3-based proteogenomics framework for complex polyploid plant genomes and provides an extensive community resource for wheat genome annotation refinement and visual exploration (https://bread-wheat-um.genome.edu.au/apollo/49826/jbrowse/index.html).

4

Contextualised real-time mass spectrometry improves glycosylation detection and characterisation

Kelly, M. I.; Ashwood, C.

2026-07-03 biochemistry 10.64898/2026.07.03.736344 medRxiv

Top 0.3%

8.1%

Glycosylation is a structurally diverse, non-template-driven modification whose analysis by liquid chromatography-mass spectrometry is constrained by discovery-mode acquisition rules developed for proteomics. Data-dependent acquisition filters, such as intensity-based precursor selection and charge-state exclusion, map poorly onto glycans, which span wide ranges of charge state and abundance independent of their biological importance. Here we present glycosylation real-time mass spectrometry (GlycoRTMS), an instrument-API method that annotates observed precursor masses with glycan compositions in real time and uses this context to guide fragmentation. Composition-aware precursor prioritisation sampled deeper into the precursor space, expanding MS2 coverage of a hyaluronic acid hydrolysate from four to eight oligosaccharide subunits. Charge-state-specific collision energy equations tailored to oligosaccharides produced complete fragment ladders where fixed normalised collision energy did not. MS3 triggering gated by both diagnostic ions and glycan composition matching enabled efficient, chromatography-compatible characterisation of O-acetylated sialic acids and identified product ions specific to O-acetylation. Together, these strategies improve both the depth and quality of glycan detection and characterisation within a single injection.

5

Defining Quality Control Standards for Single-Cell Proteomics by Inter-Laboratory Benchmarking

van Puyenbroeck, S.; Claeys, T.; Seth, A.; Rijal, J.-B.; Keller, C.; Lin, L.; Mayer, R.; Matzinger, M.; Han, I.; Aragon Fernandez, P.; Petrosius, V.; Boyle, B.; Rivera, K.; Tourniaire, G.; Rosenberger, F. A.; Martens, L.; Carr, S. A.; Dong, Z.; Vegvari, A.; Carapito, C.; Kelly, R.; Mechtler, K.; Budnik, B.; Schoof, E. M.; Ctortecka, C.

2026-07-14 biochemistry 10.64898/2026.07.13.738155 medRxiv

Top 0.3%

8.1%

Single-cell proteomics can quantify thousands of proteins from individual mammalian cells, yet the absence of community-wide quality control limits biological interpretability. Here, the HUPO Single Cell Initiative presents the first inter-laboratory single-cell proteomics benchmarking study across seven laboratories using standardized 384-well plates acquired on Orbitrap Astral and timsTOF Ultra2 instruments. Centralized analysis across six DIA software tools revealed that software choice impacts identification depth and quantitative accuracy more than instrument vendor. Multi-layered quality control enabled the detection of cell-leakage during sorting, LC misconfiguration, column degradation and site-specific pipetting failures. Inter-lab quantitative correlations were strongest between instruments of the same vendor relative to cross-platform comparisons. Sequential correction for plate identity and well position recovered clean cell-type separation for confident downstream differential expression analysis. This study provides a data-driven quality control framework spanning plate design to batch correction for reproducible single-cell proteomics across laboratories and platforms.

6

Systematic optimization and benchmarking of synchro-PASEF for high-throughput phosphoproteome profiling

Brademan, D.; Mullarkey, A.; Greeson, M.; Szvetecz, S.; Vitek, O.; Blythe, E.; Huttenhain, R.

2026-06-27 biochemistry 10.64898/2026.06.26.734570 medRxiv

Top 0.3%

7.7%

High-throughput data-independent acquisition (DIA) workflows paired with short chromatographic separations are increasingly adopted for systems biology and clinical proteomics. However, narrower peak widths from rapid separations demand faster mass spectrometer cycle times to maintain quantitative depth and reproducibility. The synchro-PASEF acquisition mode on timsTOF mass spectrometers diagonally scans across ion mobility and m/z space, enabling efficient sampling of the precursor ion cloud with shortened cycle times. While synchro-PASEF has demonstrated competitive identification depth for global protein abundance samples compared to conventional dia-PASEF, its performance for phosphoproteomics - where the precursor ion cloud is characteristically broader and bimodally distributed - has not been evaluated. Here, we systematically optimized synchro-PASEF methods for phosphoproteomics and benchmarked performance against two dia-PASEF methods across three sub-hour separations. We found that synchro-PASEF performance depends critically on balancing diagonal window number, total isolation width, and gradient length, with longer gradients favoring more windows for selectivity and shorter gradients favoring fewer windows to preserve sampling frequency. An optimized configuration quantified over 19,000 localized phosphosites using a 23-minute separation. Retention time summation (RTsum) with a factor of 2 increased phosphopeptide identifications by 5-20% and reduced phosphosite-level coefficients of variation by up to 30% across all dia-PASEF and synchro-PASEF methods tested. Using β2-adrenergic receptor (B2AR) activation as a signaling model, we demonstrate that label-free DIA phosphoproteomics can be used to model phosphoproteomics dose-response relationships, showing that synchro-PASEF and dia-PASEF produce highly concordant phosphoproteomic responses, with comparable numbers of responding phosphosites, similar effect sizes, and nearly identical predicted protein kinase A (PKA) substrates downstream of the activated B2AR. While synchro-PASEF did not surpass optimized dia-PASEF in identification depth, its comparable biological performance and amenability to post-acquisition optimization through RTsum support its utility for high-throughput phosphoproteomics. This work provides a transferable framework for synchro-PASEF method optimization and demonstrates the broad utility of retention time summation for PASEF-based phosphoproteomics workflows.

7

Learning Fragmentation Physics or Exploiting Sequence Priors? Benchmarking Bias in Deep Learning Models for De Novo Peptide Sequencing

Li, J.; Rost, H.

2026-06-29 bioinformatics 10.64898/2026.06.23.734131 medRxiv

Top 0.4%

5.8%

Deep learning models have advanced de novo peptide sequencing, but their predictions may reflect both physics-based spectral evidence and learned peptide-sequence priors. Systematically measuring such prior-associated behavior is important for benchmarking model robustness beyond conventional proteomics data. Here, we introduce the Prior Bias Index (PBI), a general framework for measuring the extent to which model behavior shifts toward prior-associated reference patterns under controlled conditions, and implement it as DeNovo-PBI, a benchmark for quantifying prior bias in de novo peptide sequencing models. DeNovo-PBI combines benchmark dataset construction, in silico sequence and spectral perturbation workflows, PBI-based metrics, and analysis algorithms to evaluate three forms of prior-associated behavior: sequence-distribution dependence, database amino-acid-pair order preference, and mutation-group prediction consistency under shared sequence context. In addition to experimentally acquired peptide spectra, we generated in silico spectra from random, natural, and mutated peptide sequences and selectively removed fragment ions that distinguish N-terminal residue orders. Across these assays, deep learning models showed peptide-sequence-distribution-dependent performance and strong directional amino-acid-pair order preferences even when order-diagnostic spectral evidence was removed. DeNovo-PBI provides a quantitative benchmark for measuring, comparing, and interpreting learned bias in de novo peptide sequencing models.

8

Development of an Ethylenediaminetetraacetic Acid-Enhanced Deep Proteomic Profiling Method for Dried Blood Spots and Its Application in Mouse Disease Models

Nakajima, D.; Kanno, T.; Okuda, Y.; Mitsui, H.; Konno, R.; Ueyama, N.; Endo, Y.; Ohara, O.; Kawashima, Y.

2026-07-14 molecular biology 10.64898/2026.07.13.738354 medRxiv

Top 0.4%

5.5%

Dried blood spots (DBS) are well-established microsamples used in clinical testing and newborn screening. However, their use in deep proteomics is hindered by highly abundant blood proteins and inefficient protein recovery from filter paper matrices. The non-targeted analysis of non-specifically DBS-absorbed proteins (NANDA) workflow partially overcomes the impact of abundant blood proteins and has enabled the identification of over 5,000 proteins from DBS samples. Nonetheless, residual abundant proteins, including hemoglobin and fibrinogen, constrain deep proteomic analysis. Therefore, this study aimed to evaluate the effects of the metal chelator ethylenediaminetetraacetic acid (EDTA) on the depth of DBS proteomic analysis. An optimized EDTA-enhanced NANDA protocol that incorporated a 100 mM EDTA wash step was compatible with standard DBS collection procedures and required no modification of current clinical workflows, markedly enhancing the depletion of abundant proteins and facilitating its potential use in clinical and translational settings. When combined with Orbitrap Astral data-independent acquisition mass spectrometry, this approach enabled the single-shot identification of more than 7,000 proteins from DBS samples; to the best of our knowledge, this represents the deepest proteome coverage reported to date, and the workflow further supported high-throughput and highly reproducible analyses. Additionally, its application to mouse disease models revealed disease-specific systemic immune signatures from minimal blood volumes. Collectively, these results establish EDTA-enhanced NANDA as a practical and scalable workflow that overcomes longstanding limitations of DBS proteomics, thereby enabling deep, high-throughput, minimally invasive proteomic profiling across diverse biological and experimental contexts.

9

Spatial Metabolomics by Desthiobiotin Ligase (DESTNI) in Live Cells

Yoo, C.-M.; Jo, J.-Y.; Choi, C.-R.; Park, Y. S.; Cha, Y. J.; Jung, S.; Kang, J.; Kim, J.; Kang, Y. P.; Yoo, T. H.; Kim, J.-S.; Rhee, H.-W.

2026-07-06 biochemistry 10.64898/2026.07.05.736296 medRxiv

Top 0.4%

5.5%

Proximity labeling has transformed spatial proteomics by enabling compartment-resolved mapping of protein environments in living cells, yet its extension to small-molecule metabolites has not been demonstrated, probably due to limitations in labeling chemistry and identification of labeled metabolites. Here, we introduce DESTNI, an engineered desthiobiotin (DTB) ligase derived from TurboID through directed evolution, and establish a platform for spatially resolved profiling of amine-containing metabolites. A directed evolution strategy based on a yeast display system yielded DESTNI with an efficient DTB-dependent reactivity, enabling robust and compartment-specific proximity labeling across diverse subcellular environments. To identify the DTB-modified amino metabolome, we developed an integrated analytical framework combining DTB-modified amino metabolite standards, in vitro DESTNI profiling, and in silico MS/MS prediction, enabling systematic annotation of DTB-modified amino metabolites. To extend this chemistry to metabolites, we combined synthetic DTB-conjugated metabolite reference standards, in vitro DESTNI-reactive metabolite discovery, and machine-learning prediction of DTB-derivatized metabolites and oligopeptides. Organelle-targeted DESTNI recovered reproducible compartment-enriched amino metabolite signatures, including mitochondrial matrix-enriched glycine, 5-aminolevulinic acid, ornithine and spermidine adducts, as well as nuclear-enriched γ-aminobutyric acid and 5-aminovaleric acid adducts. Together, this work establishes DESTNI as a proximity labeling platform that bridges spatial proteomics and metabolomics and provides a general strategy for mapping subcellular biochemical environments in living cells.

10

In situ identification of substrates of the protein tyrosine phosphatase PTP1B using site-specific photo-crosslinking

Johns, A. C.; Goonatilleke, Y. S.; Cabanero, D. C.; Ma, Y.; Lee, M.; van Vlimmeren, A. E.; Jovanovic, M.; Shah, N. H.

2026-07-07 biochemistry 10.64898/2026.07.06.736850 medRxiv

Top 0.4%

5.0%

Protein tyrosine phosphorylation is critical for cellular function, and aberrant phosphorylation is tied to a wide range of human diseases. Identifying the substrates of protein tyrosine phosphatases, the enzymes that erase this modification, is critical to understanding human biology and disease states. The state-of-the-art method for tyrosine phosphatase substrate identification requires the use of mutations that modestly increase the lifetime of enzyme-substrate complexes by killing catalytic activity. While these substrate-trapping mutants are useful tools, they work best for high-affinity or abundant substrates that remain phosphatase-bound through cell lysis and enrichment. Here, we use site-specific photo-crosslinking to covalently capture the substrates of tyrosine phosphatases in situ. We identify eight different positions around the active site of the phosphatase PTP1B where photo-crosslinker amino acids can be incorporated via amber codon suppression without dramatically disrupting catalytic activity. We then conduct photo-crosslinking experiments in mammalian cells and identify crosslinked proteins by mass spectrometry proteomics, revealing that our approach can capture known PTP1B interactors and substrates. We then show that PTP1B photo-crosslinking in situ is sensitive to enzyme localization and identify new PTP1B substrates that regulate contacts between the endoplasmic reticulum and plasma membrane. We also demonstrate that photo-crosslinking can capture signal-dependent interactions. For example, we observe PTP1B crosslinking to the epidermal growth factor (EGF) receptor, a known substrate, in an EGF stimulation-dependent manner, and we identify other potential EGF-dependent substrates. Overall, our approach reveals previously unknown roles of PTP1B in signaling systems and could be readily extended to other tyrosine phosphatases in the same family.

11

Enhanced proteome relative quantification using refined quantotypic spectral libraries

Barnes, B. A.; Alharbi, H.; Unwin, R.

2026-07-10 bioinformatics 10.64898/2026.07.06.736793 medRxiv

Top 0.4%

4.9%

Plasma proteomics is used for a variety of applications including biomarker discovery, disease monitoring, and drug development. Data-independent acquisition (DIA) has vastly improved the breadth of proteins that are identified from samples; however, given challenges in reproducibility and translation, it is critical that the quantitative performance of these methods is reliable. Analysis of global proteomics data typically incorporates information from all detected peptides. However, some peptides do not reflect their parent protein amount, due to irreproducible digestion, modification, analytical interferences or instability. We hypothesise that including these peptides impacts protein relative quantification, and thus, a refined spectral library containing only quantitatively representative peptides provides superior protein quantification. By analysing a defined multi-species spike-in model, we show that refining a plasma spectral library by removing precursors that fail to meet quality control metrics (25.4% of all identified precursors) reduces noise and variability, improving precision, accuracy and differential abundance analysis by up to ~11%, with minimal identification losses and substantial reduction in computational demand. This demonstrates proof-of-concept that refining spectral libraries produces results that prioritize quantification quality over quantity. This approach could enable development of universal tissue-specific refined spectral libraries able to improve quantification quality with easy implementation and minimal processing time.

Significance of the Study: As DIA mass spectrometry proteome depth increases, the quality of the associated protein quantifications must be considered alongside identification breadth, particularly in complex matrices such as plasma, which presents additional technical challenges. The spectral library used for protein identification and quantification is a critical determinant of DIA performance, and its composition requires careful consideration. This work illustrates an initial step toward improving protein quantification starting at the spectral library level by filtering precursors which are poor quantitative representatives of their parent proteins. In doing so, the resulting data is more reliable for downstream and biological interpretation, with fewer false differential abundance assignments and reduced quantitative noise. As such, this work represents a broader shift away from the habitual focus of MS workflows on maximising the number of protein and differential abundance identifications and instead prioritises the quality of quantification over quantity. These initial findings lay the groundwork for further development of spectral library refinement strategies, with the potential to continue improving the accuracy and precision of protein quantification in DIA-based proteomics.

12

MassSpectrum Analyzer: An interactive platform for proteomic searching parameter refinement and peptide modification focused re-scoring

Karlic, K. I.; Scott, N. E.

2026-06-28 bioinformatics 10.64898/2026.06.22.733873 medRxiv

Top 0.4%

4.8%

Peptide spectrum annotation is critical for the assignment of peptides and the localisation of modifications. While many existing tools provide spectrum annotation capacities, they often lack the flexibility required to allow bespoke spectral annotation of peptides containing multiple labile modifications or the accurate assignment of peptides in which fragmentation deviates from canonical patterns. In these cases, user-guided annotation is widely used to improve assignment completeness; however, it typically does not integrate peptide scoring, making it challenging to assess the empirical improvement of the associated annotation and its impact on downstream false-discovery rate estimations. Here, we introduce an interactive annotation environment, the 'MassSpectrum Analyzer', which aims to streamline the exploration and analysis of modified peptides by enabling user-defined customisation with peptide scoring. Using (2-Aminoethyl)trimethylammonium carboxyl-derivatised peptides and glycopeptides as case studies, we demonstrate the capacity of the MassSpectrum Analyzer to rapidly explore and allow the assessment of modified peptide datasets. By enabling direct assessment of the impact of user-guided choices on peptide scoring, we show how the detection of highly modified peptides can be improved through post-search integration of modification fragmentation information in a statistically robust manner. Similarly, by permitting comparisons of peptide ion intensities across spectra, we show that global fragmentation patterns can be quantified, allowing the interrogation of trends that only become clear when spectra are assessed en masse. Combined, the MassSpectrum Analyzer streamlines the generation of publication-ready spectra and provides a means to assess how the inclusion of annotated features influences assignment scores.

13

Sortase-mediated enrichment of ubiquitinated proteins from complex samples

Raniszewski, N.; Beckley, K.; Hintzen, J.; Noel, M.; Burslem, G.

2026-07-01 biochemistry 10.64898/2026.06.29.735432 medRxiv

Top 0.5%

4.4%

Despite its importance in cellular signaling and protein fate, the detection of protein ubiquitination in proteomics experiments presents many challenges for researchers. Importantly, current techniques that often rely on antibodies specific for lysine sidechain modifications may miss non-canonical ubiquitination sites in experiments. We envisioned a strategy that uses sortase, a bacterial transpeptidase enzyme, to selectively modify ubiquitination sites with a Biotin tag for enrichment and downstream proteomics experiments. In this work, we demonstrate our ability to selectively modify N-terminal diglycine remnants in digested proteins with a Biotin-modified peptide, enabling downstream enrichment of previously ubiquitinated proteins. We show this proof of concept on several recombinant proteins, revealing a site of autoubiquitination in the E2 conjugating enzyme Ubc13. We show that elution of the enriched peptides can be achieved by using common guanidinium elutions or by leveraging the reversibility of sortase. Finally, we include a bifunctional peptide that is labile to trypsinization to better streamline this strategy for downstream proteomics approaches. We envision that this approach will provide an accessible strategy for the detection of ubiquitinated proteins in proteomics experiments, with the goal of enabling researchers to better detect noncanonical protein ubiquitination.

14

Analyte-Class-Dependent Electrophoretic Organization Enables Single-Run Proteome-Metabolome Analysis by CE-ESI-MS

Kumar, R.; ONeal, R. M.; Nemes, P.

2026-07-03 biochemistry 10.64898/2026.07.03.736334 medRxiv

Top 0.5%

4.2%

Dual proteome-metabolome measurements from limited samples typically require sample splitting or sequential analyses using electrospray ionization mass spectrometry (ESI-MS). Here we show that capillary electrophoresis (CE) can avoid that tradeoff by organizing predominantly singly charged small molecules and multiply charged peptides into partially resolved, analyte-class-dependent regions of migration time-m/z space. Leveraging this intrinsic electrophoretic organization together with charge- and m/z-resolved precursor selection, we developed a single-run CE-ESI-MS workflow that combines single-vial sample processing with class-resolved tandem MS acquisition. In a HeLa digest spiked with 17 amino acids, the integrated analysis detected all amino acids while preserving proteomic depth relative to a dedicated proteomics run, yielding 1,221 versus 1,227 cumulative protein groups. Applied to identified single Xenopus laevis blastomeres, the method provided matched readouts of 86 metabolite features together with 1,097 and 1,083 protein groups from D1.1 and V1.1 cells, respectively. The paired measurements resolved cell-type-dependent molecular differences and mapped protein and metabolite changes into shared pathway context. These results establish analyte-class-dependent electrophoretic organization coupled to class-resolved MS acquisition as an analytical basis for single-run proteome-metabolome analysis by CE-ESI-MS in material-limited samples.

15

Data Independent Acquisition Pipeline for Microbiome Samples (Microbe-DIA)

Obermiller, S. A.; Lipton, M. S.; Piehowski, P. D.; Bilbao, A.; McCue, L. A.; Prozapas, V. N.; Attah, I. K.

2026-07-14 microbiology 10.64898/2026.07.13.738261 medRxiv

Top 0.5%

4.1%

The functional complexity inherent in microbiomes complicates analytical approaches aimed at defining phenotype. As proteins are the functional effectors of microbiome phenotypes, improving the performance of mass spectrometry-based metaproteomics is critical to achieving the functional characterization of these systems. Data-independent acquisition (DIA) improves protein coverage and reduces data missingness when compared to data-dependent acquisition (DDA) in metaproteomics. However, the application of DIA to complex microbial systems remains constrained by analytical throughput and computational scalability. Here, we optimized LC-MS/MS acquisition parameters for both DDA and DIA using a model microbiome, demonstrating how DIA enables increased sample throughput without compromising quantitative performance. In addition, we demonstrated a computationally efficient, library-free DIA workflow that overcomes reliance on empirical spectral libraries. Our analytical and computational innovations establish a scalable and cost-effective pipeline for metaproteomics of complex microbial communities.

16

False discovery rate control for trustworthy AI-based de novo peptide sequencing

Liang, Z.; Dai, C.; Ling, T.; Yang, T.; Yang, Y.; Leng, Y.; Xie, L.; He, Y.; He, F.; Wang, Y.; Chang, C.

2026-06-29 bioinformatics 10.64898/2026.06.29.735174 medRxiv

Top 0.5%

4.0%

AI-based de novo peptide sequencing predicts peptide sequences from tandem mass spectra, enabling identification beyond predefined databases but leaving prediction reliability difficult to assess, particularly with respect to false discovery rate (FDR). In database search, FDR control is provided by target-decoy competition over a finite search space, whereas de novo predictions are generated in open sequence space and lack naturally matched sequence-level decoys. Here we introduce Counterpart Calibration Theory (CCT), a theory-guided framework that reframes de novo FDR control as a four-group score-ranking and threshold-selection problem over target-side predictions and matched counterpart-side comparators. Implemented in π-NovoQC, CCT provides dual-level FDR control at the peptide-spectrum match and peptide levels. Across models, datasets, instruments and acquisition modes, π-NovoQC achieves stable FDR control while preserving identification yield. In large-scale proteomic applications, π-NovoQC recovers low-abundance in-database peptides missed by database search and provides de novo-supported protein-group and variant evidence.
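For context, the conventional target-decoy competition that the abstract above contrasts with de novo sequencing can be sketched as below. This is a generic illustration of database-search FDR estimation, not CCT and not the π-NovoQC code: the function names are hypothetical, and the simple estimator decoys/targets is assumed (some tools use variants such as (decoys + 1)/targets).

```python
from typing import List, Tuple

Psm = Tuple[float, bool]  # (score, is_decoy); a higher score is a better match


def tdc_q_values(psms: List[Psm]) -> List[Tuple[float, bool, float]]:
    """Rank PSMs by score and attach a q-value to each one.

    Assumed estimator: FDR at a given rank = decoys so far / targets so far.
    The q-value is the smallest FDR at which a PSM would still be accepted.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    fdrs: List[float] = []
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # Convert FDRs to q-values with a running minimum from the worst score up.
    running = 1.0
    q_values: List[float] = []
    for fdr in reversed(fdrs):
        running = min(running, fdr)
        q_values.append(running)
    q_values.reverse()
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, q_values)]


def accepted_targets(psms: List[Psm], fdr_threshold: float) -> int:
    """Count target PSMs whose q-value is at or below the threshold."""
    return sum(
        1 for _s, is_decoy, q in tdc_q_values(psms)
        if q <= fdr_threshold and not is_decoy
    )


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, True),
               (7.0, False), (6.0, False), (5.0, True)]
    print(accepted_targets(example, fdr_threshold=0.25))
```

The procedure depends on every target having a comparable decoy in the same finite search space, which is the property the abstract notes is missing when predictions come from open sequence space.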

17

Hidden Structural Bias in Proteomics: Sonication-induced Selective Fragmentation of Intrinsically Disordered Regions

Narita, M.; Yamakawa, T.; Nishimura, R.; Iwasaki, M.

2026-07-15 cell biology 10.64898/2026.07.14.738389 medRxiv

Top 0.5%

4.0%

Sonication is a fundamental technique in proteome sample preparation, primarily used for protein solubilization and shearing of genomic DNA. Although the mechanical shearing of DNA is well-characterized, its unintended impact on protein structural integrity remains a significant "blind spot" in high-throughput analytical workflows. In this study, we systematically investigated sonication-induced protein fragmentation by combining gel-based fractionation (PEPPI-MS) with sequence-level compositional analysis and bioinformatic mapping. Our results demonstrate that sonication does not significantly alter overall proteome identification or the recovery of membrane proteins; however, it induces extensive and non-random protein fragmentation. Sonication caused an approximately three-fold increase in the abundance of >45 kDa protein-derived fragments migrating into the <40 kDa fraction, and 1,620 high-molecular-weight (MW) proteins were uniquely detected in the lower-MW fraction upon sonication, an eight-fold increase over non-sonicated controls. Peptide-level amino acid composition analysis revealed subtle but directional shifts in the sonication-derived fragments. This residue-level signature is reinforced by two orthogonal structural analyses (MobiDB peptide-level mapping and protein-level profiling using metapredict V3 software), which show that sonication-susceptible proteins harbor more than twice the disordered content of length-matched controls (median 40% vs. 18%). This study identifies a previously unrecognized "structural bias" whereby intrinsically disordered region (IDR)-rich proteins are selectively compromised during sample preparation. 
Because these fragments are indistinguishable from enzymatic digestion products in conventional bottom-up proteomics, the underlying structural damage is effectively masked in global quantitative datasets, potentially distorting biological interpretations related to protein size, isoforms, and stability, particularly for IDR-rich classes, such as transcription factors and signaling molecules. We propose that optimizing and standardizing sonication parameters is essential for ensuring the accuracy and reproducibility of quantitative proteomic analyses.

18

Protein Aggregation Capture for Top-down Proteomics

Feltenstein, I. G.; Drown, B. S.

2026-07-03 biochemistry 10.64898/2026.07.02.736076 medRxiv

Top 0.5%

3.6%

Proteins are dynamically regulated by a myriad of post-translational modifications (PTMs) that control their stability, conformation, activity, subcellular localization, and local interactions. Capturing the precise composition of these various modification states, or proteoforms, is a principal objective of top-down proteomics (TDP). By ionizing intact proteoforms and combining measurements of precursor ion and fragment ion masses, the position, stoichiometry, and combination of PTMs can be determined. Despite the highly valuable measurements that TDP can provide, it is typically less sensitive than corresponding peptide-level analysis, with many reports utilizing input material in the microgram to milligram range. Contributing to this lack of sensitivity is the risk of sample loss due to non-specific binding to surfaces during sample preparation. The most widely employed sample preparation approaches for TDP either require high sample input (e.g., precipitation and ultra-filtration) or fail to effectively remove surfactants (e.g., solid-phase extraction). These limitations have hindered advancement of targeted TDP applications involving immunoprecipitation and other enrichment strategies. Bead-assisted protein aggregation, also referred to as single-pot, solid-phase-enhanced sample preparation (SP3), has emerged as a popular sample preparation strategy for bottom-up proteomic workflows, but has only been used in TDP with secondary ion exchange chromatography cleanup. We envisioned a magnetic bead-based protein cleanup approach that, with judicious choice of bead surface chemistry and elution conditions, proceeds directly to MS analysis. Here we report a sample preparation method using hydroxyl-functionalized magnetic beads for top-down proteomics applications.

19

Longitudinal proteomic module configurations differ across human monocyte-derived differentiation and polarization conditions

Navarro Quiroz, R.; Escorcia Lindo, K.; Jaruffe Pinilla, A.; Luna-Rodriguez, I. K.; Diaz-Olmos, Y.; Geribaldi-Doldan, N.; Fernandez-Ponce, C.; Zarate Penate, E.; Bello Lemus, Y.; Pacheco Lugo, L.; Pacheco Londono, L.; Acosta-Hoyos, A.; Navarro Quiroz, E.

2026-06-26 immunology 10.64898/2026.06.24.734366 medRxiv

Top 0.5%

3.5%

Monocytes are key innate immune cells capable of differentiating into macrophages and dendritic cells, or adopting distinct polarization states in response to microenvironmental signals. Despite advances in transcriptomics, the proteomic landscape underlying these differentiation and polarization trajectories remains incompletely characterized. Here, we performed a longitudinal proteomic analysis of human monocyte-derived macrophages and dendritic cells under multiple polarization conditions using publicly available mass spectrometry datasets. We applied weighted protein co-expression network analysis (WPCNA) to identify proteomic modules and characterized their temporal dynamics across differentiation and polarization conditions. Our results reveal that proteomic module configurations are condition-specific and exhibit distinct longitudinal trajectories, with modules enriched in immune signaling, metabolic reprogramming, and cytoskeletal remodeling pathways. Hub proteins within these modules represent potential regulatory nodes governing monocyte fate decisions. These findings provide a systems-level view of monocyte proteome organization and highlight the divergent molecular programs that underlie functional specialization during differentiation and polarization.

20

Incorporating Surface-Induced Dissociation Mass Spectrometry Data into an AlphaFold-derived deep learning network improves protein structure prediction

Bolz, R. M.; Day, E. H.; Drake, Z. C.; Harvey, S. R.; Wysocki, V. H.; Lindert, S.

2026-06-29 biochemistry 10.64898/2026.06.26.734850 medRxiv

Top 0.6%

3.3%

Surface-Induced Dissociation native Mass Spectrometry (SID-nMS) is a tandem MS activation method that yields information on the connectivity and stoichiometry of protein complexes. While insufficient for direct structure elucidation, the data derived from SID-nMS has considerable potential to inform multimeric protein structure prediction. We hypothesized that incorporating this data into a machine-learning framework could improve multimer prediction accuracy beyond that of existing deep-learning methods. To this end, we developed SIDFold, a novel AlphaFold-based deep-learning network. SIDFold is the first AlphaFold-like network to leverage experimental data during protein complex prediction, and the first deep-learning network to utilize nMS data for structure prediction. We benchmarked SIDFold on the BETA protein set, and observed an improvement in RMSD in 138 of 227 cases including 27 targets in which the predicted structure attained near-native accuracy. We then evaluated the network on 20 proteins with experimental SID-nMS data, yielding an improved RMSD in 18 cases, with five of these cases improving to a high-accuracy complex. Finally, we tested SIDFold against a previously published SID-guided Rosetta docking method, where we saw improvement in 13 of 16 proteins. SIDFold is freely available on GitHub, with example files and commands available in the Supplementary Information.